UNCERTAINTY QUANTIFICATION IN COMPLEX SYSTEMS 
USING APPROXIMATE SOLVERS 
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Abstract. This paper proposes a novel uncertainty quantification framework for computation- 
ally demanding systems characterized by a large vector of non-Gaussian uncertainties. It combines 
state-of-the-art techniques in advanced Monte Carlo sampling with Baycsian formulations. The key 
departure from existing works is the use of inexpensive, approximate computational models in a 
rigorous manner. Such models can readily be derived by coarsening the discretization size in the so- 
lution of the governing PDEs, increasing the time step when integration of ODEs is performed, using 
fewer iterations if a non-linear solver is employed or making use of lower order models. It is shown 
that even in cases where the inexact models provide very poor approximations of the exact response, 
statistics of the latter can be quantified accurately with significant reductions in the computational 
effort. Multiple approximate models can be used and rigorous confidence bounds of the estimates 
produced are provided at all stages. 
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1. Introduction and examples. Scientists have come to recognize the stochas- 
tic aspects inherent in several physical systems and processes and seek ways to quan- 
tify the probabilistic characteristics of their behavior. Their analysis tools arc usually 
restricted to elaborate legacy codes which have been developed over a long period 
of time and are generally well-tested. They do not however include any stochas- 
tic components and their alteration is commonly impossible or ill-advised. In many 
problems of engineering or physical interest the only feasible solution for uncertainty 
quantification is provided by non-intrusive methodologies. 

Traditionally, two approaches have have attracted most attention. On one hand 
methods based on polynomial chaos expansions (PC, [31])) and on the other tech- 
niques anchored around Monte Carlo simulations. PC models, although originally de- 
veloped as intrusive techniques (|15j). have grown into prominence in recent years with 
the development of non-intrusive, stochastic collocation approaches ([32], [12]). They 
arc based on a representation of the random input by a finite number of uncorrelated 
random variables (usually normally distributed) and orthogonal polynomials (usually 
Hermite). The solution or output process is expressed with respect to the same basis 
and the coefficients of the expansion are determined by calculating weighted resid- 
uals or using a collocation approach. Although mathematically elegant, PC-based 
approaches utilize a second-order matching (up to the autocovariance function) of 
the input processes which does not account for important higher order statistics that 
might affect the system's response. The computational effort grows with the number 
of random variables used to approximate the input which also adversely affects the 
accuracy, particular in the stochastic collocation version, as an interpolation in a very 
high dimensional space is required. 

Standard Monte Carlo simulations require a minimal implementation overhead 
as the coupling with existing deterministic solvers is trivial. Most often than not 
however, in systems of physical interest, each of the runs of the forward solver re- 
quires several CPU- hours and multiple processors. Even though the convergence rate 
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0(y/~N) (where N is the number of independent samples) is independent of the di- 
mensionality of the random input, it is sufficiently slow to constitute the method 
impractical or infeasible as several calls to forward solver have to be made to achieve 
good accuracy. Another disadvantage of classical Monte Carlo is that it does not 
directly provide confidence intervals for the estimates produced as those are usually 
based on asymptotic results (i.e. when N — > oo) and the central limit theorem ([28]). 
Recent years have seen significant progress in the development of advanced Monte 
Carlo techniques which employ evolutionary strategies in combination with Markov 
Chain Monte Carlo (MCMC) and Importance Sampling ( [TU [551 EH ) ■ In many cases 
this has led to algorithms which require 10 or 100 times less samples in order to 
produce estimates of the same accuracy (PJ). 

Despite this dramatic improvements, the associated computational effort can still 
be tremendous for systems of practical interest, where even a modest number of 100 
or 1000 runs can be infeasible. It is obvious that a new perspective is needed. In the 
author's opinion this can be achieved if analysis goes beyond the black-box solver as 
the only means of probing the problem of interest. Indeed in many situations, several 
other pieces of knowledge and structural elements of the problem at hand, are read- 
ily available but left uncxploited. For example, quite frequently the computational 
models of interest involve the solution of a system of PDEs using Finite Elements 
(FE) or Finite Differences (FD). These imply the spatio-temporal discretization of 
the governing differential equations and quite often the mesh sizes or time steps have 
to be particularly small in order to capture the salient features of the solution. With- 
out recourse to rigorous mathematical proofs, it is well-known that an FE solver that 
operates on a coarser spatio-temporal grid can give an approximate solution at a lower 
computational cost as the system of equations that need to be solved are smaller. The 
deviation from the "exact" (or reference) solution can be significant but in principle 
this approximate solver can be used to obtain some, inaccurate of course, informa- 
tion about our exact model. As it will be demonstrated in the sections to come, it 
is not important if the solutions of the approximate solver deviate significantly from 
the exact, but it suffices that they exhibit some sort of dependence. It is this de- 
pendence that we will exploit in a general and rigorous computational framework in 
conjuction with a few, carefully selected runs of the full, exact model. We will make 
no claims about the optimality of the approximate solvers selected. In fact as it will 
be demonstrated in the examples even crude approximations can yield impressive in- 
creases in computational efficiency. Furthermore, the framework proposed allows for 
the introduction of several such approximate models similar to the way one would 
elicit opinions from multiple experts before making a decision. In that respect even 
low-order, fast PC models can be utilized. Such an approach can be employed even 
in cases where no accurate computational model exists but rather we have to rely on 
experiments in order to collect the necessary information about the system. Since 
conducting experiments can be costly and time consuming it is desirable to minimize 
them by making use of approximate computational models that might be available. 

To that end we investigate Bayesian alternatives to classical uncertainty quan- 
tification techniques In particular we formulate a regression problem that establishes 
the connection between the response values from the approximate and exact solver. 
This is achieved using a flexible, non-parametric Bayesian model that employs a very 
efficient Sequential Monte Carlo inference algorithm. An added advantage of this ap- 
proach is that prior knowledge or expertise of the analyst regarding the relationship 
between approximate and exact solvers can be readily incorporated in the prior dis- 
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tributions. Once this relation is established, the posterior distribution can be readily 
used to obtain estimates and confidence intervals on the output statistics of interest. 
These can in turn be used to actively and adaptivcly improve the accuracy of the 
regression by performing runs of the expensive solver in regions that contribute most 
significantly to these uncertainties. An interesting extension involves using more than 
one approximate solvers simultaneously in order to improve the accuracy and com- 
putational efficiency of the model. This resembles mixtures of experts models that 
are commonly used in various statistical applications, as each approximate solver pro- 
vides some, generally incomplete, information about the exact model which is then 
aggregated in order to obtain the best possible estimate. 

2. Proposed Approach. Let (Sl,F,V) be a complete probability space, where 
is the event space, J- the er-algebra, and V the probability measure. Consider the 
following stochastic differential equation: 

C(u(z, t); €(w)) = /(*, t; z e V, t > (2.1) 

defined on the domain T> C K 9 (q = 1,2,3) with appropriate initial/boundary 
conditions which might also depend on the vector of uncertainties represented by 
: fl — > K d . We are particularly interested in the most general and difficult case 
where £ is of very large dimension (i.e. d » 1 ) and it should not or cannot be 
condensed using any of the standard dimension reduction techniques (e.g. PCA). Let 
u(z,t;£(u)) denote the solution process which satisfies Equation (|2.1|) for P-almost 
everywhere. We are interested in the statistics of the output itself or of a function 
thereof which we denote by y(£) : R d — > M. r emphasizing the dependence on the vector 
of input uncertainties £. We further postulate the existence of a forward solver of the 
linear/nonlinear equation in Equation (|2.ip that corresponds to a deterministic ver- 
sion of the differential operator C (for fixed £) . In general however one might consider 
reduced versions of the above problem (i.e. dependence only on time or space) or even 
systems (and associated computational models) which are not governed by SPDEs but 
nevertheless characterized by a high-dimensional vector of input uncertainties For 
illustration purposes we will restrict the presentation to the case that y is scalar (i.e. 
r = 1). Naturally y may depend on other deterministic parameters which are omitted 
for economy of notation. 

The input uncertainties £ arc characterized by a probability density 7T£ . In order 
for the problem to be well-posed, 7T£ need not be known analytically but it suffices to 
be able to draw samples from ir^. Our goal is to calculate statistics of the response, 
e.g.: 

Pr[yeA} = J « (2.2) 

where 1.4 is the indicator function of a ^-measurable subset A, or: 

E[h{y)\ = J %(£)) tt c (£) (2.3) 

where h is any 7r^-integrable function. 

Quite often the statistics of interest involve very rare events (i.e. Pr [y e A] < < 1] , 
as is the case for example in molecular dynamics simulations where transition to a 
local minimum of the free-energy landscape happens infrequently or in estimating 
reliability of mechanical components. In other cases we are interested in expectations 
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of multimodal functions h as is the case in nonlinear dynamical systems where small 
perturbations in the input £ can lead to significant differences in the (long-term) re- 
sponse. Due to the large variance of the integrands in Equations (|2.2[) or (|2.3p . several 
calls to the forward solver have to be made (to calculate the response y for various 
£'s) which imply a significant or insurmountable computational burden, particularly 
in cases where each of these calls imply several CPU-hours on multiple processors. 

To address these issues we postulate the existence of M approximate forward 
solvers x m (£) : M. d — > R r , for m = 1, . . . M, as those discussed in the introduction and 
in the numerical examples to follow. Each of those provides approximations to the 
output of interest at a fraction of the computational cost. The latter requirement is 
the key in increasing the overall computational efficiency in the proposed framework, 
whereas the former condition can be interpreted very loosely. In fact it is acceptable 
that the x rra 's provide very poor estimates (i.e. y(£) — x m (£) is relatively large) as long 
as there is some statistical dependence between them. In that sense x m might not 
even correspond to the same output quantities, although in such cases the selection of 
reasonable approximate solvers can be less straightforward. For the purposes of this 
work, x TO 's are viewed as (potentially) biased and partial predictors of the output 
y. Our goal is to quantify the information these predictors provide for the purposes 
of estimating statistics of y at a fraction of the computational cost. In particular, if 
x = (xi, X2, ■ ■ ■ , x m ). Equation (|2 . 2[) can be rewritten as: 

Pr[y eA]=E x [Pr[y G A | x]] 

= J Pr\y G A I x]vr x (x) dx 

J l A (y) p(y | x) dy^j tt x (x) dx (2.4) 

and Equation (|2.3[) : 

E[h(y)] = E x [E[h(y) | x]] 

= J E[h{y) | x]tt x (x) dx 

j Kv) p(y I x ) d yj M*) dx ( 2 - 5 ) 

Hence it is apparent that estimates of Pr[y G A] and E[h(y)] can be obtained as 
long as the density ^(x) = J 6(x — x(£)) d£ and conditional p(y | x) are known. 

Given that calls to the approximate solvers are computationally inexpensive in relative 
terms as it will be seen in the examples of section [3J ^(x) can be readily evaluated 
using direct or advanced Monte Carlo techniques as those discussed previously. The 
pivotal component is the conditional density p(y \ x) which probabilistically quantifies 
the information that the predictors x carry about y. 

Figure [2~T1 trivially illustrates the two extreme scenaria. On one hand y and x are 
statistically independent. In this case knowledge of x is completely useless in furnish- 
ing information about y and p{y \ x) = p{y) . Hence the proposed framework cannot 
offer any improvement. On the other extreme, there exists an injective, deterministic 
mapping between the two quantities, i.e. y = g(x) and therefore knowing x and its 
statistics translates straightforwardly to y since p(y | x) = S(y — g(x)). The proposed 
methodology is applicable to all cases except the one of independence between x and 
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Fig. 2.1. 

Critical to the feasibility of proposed framework is establishing a quantitative 
link between the exact y and approximate x outputs as described by p(y | x). This 
will be accomplished using computationally generated data that consist of pairs of 
{(xj = x(£j), yi — y(£i))}f = i obtained by running approximate and exact solvers for 
the same This is discussed in detail in the next 3 sub-sections. The task of utilizing 
the inferred models for the purposes of estimating Equation (j2.4[) or Equation (|2.5|) 
is discussed in sub-section 12.41 and illustrated in the examples of section [3] 

2.1. Hierarchical Bayesian model. We assume that the data have been rescaled 
so that Xj € [0, 1] M and consider regression models of the form: 

y{a i ) = y i = f{Ati)\9) + °Zi (2-6) 

where / is a function of the predictors x = (x\, . . . Xm) and model parameters 9, and 
Zi are i.i.d standard normal variates i.e. Zi ~ Af(0, 1) (if y € K r then /, Zi € W). 
Equation (|2 .6[) postulates that, given the model parameters 6, for an input £ 4 for 
which the outputs of the approximate models (^j), the target response y(£j) is 

normally distributed with mean /(x(£j);0) and standard deviation a, i.e.: 

!/i|x« i ),fl,<7~^(/(x(€ i );fl),a a J) (2.7) 

At first glance such a model seems highly restrictive as it is unlikely that y is normally 
distributed (given x and model parameters 6). For that purpose we adopt a Bayesian 
formulation in which the model parameters 6 are assumed random and equipped with 
a distribution. This allows us to actually formulate a family of such models (each 
corresponding to a particular 6) and even though conditionally on 6, y is normally 
distributed, marginally (when 6 are integrated out) non-Gaussian distributions can 
be considered. 

Bayesian formulations differ from classical statistical approaches (frequentist) in 
that all unknown parameters are treated as random. Hence the results of the infer- 
ence process are not point estimates but distribution functions. The basic elements of 
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Bayesian models are the likelihood function L(9) = p(data \ 9) which is a conditional 
probability distribution and gives a (relative) measure of the propensity of observing 
data for a given model configuration specified by the parameters 9. The likelihood 
function is also encountered in frequentist formulations where the unknown model 
parameters 9 are determined by maximizing L(6). This could be thought as the prob- 
abilistic equivalent of deterministic optimization techniques commonly used in such 
problems. The second component of Bayesian formulations is the prior distribution 
p(0) which encapsulates in a probabilistic manner any knowledge/information/insight 
that is available to the analyst prior to observing the data. Although the prior is a 
point of frequent criticism due to its inherently subjective nature, it can prove ex- 
tremely useful in the context of problems examined as it provides a mathematically 
consistent vehicle for injecting the analyst's insight (whenever it is available) with re- 
gards to the relation between the exact and approximate models. The combination of 
prior and likelihood based on Bayes' rule yields the posterior distribution n(9) which 
probabilistically summarizes the information extracted from the data with regards to 
the unknown 9 : 



Hence Bayesian formulations allow for the possibility of multiple solutions - in fact 
any 9 in the support of the likelihood and the prior is admissible - whose relative 
plausibility is quantified by the posterior. Credible or confidence intervals can be 
readily estimated from the posterior which quantify inferential uncertainties about 
the unknowns. 

The crucial ingredient is of course the prior specification, not only in terms of the 
functional form of p(9) but primarily in terms of the structural characteristics of the 
relation between y and x that is implied in Equation (|2.6|) . It is easily understood, 
that any parameterization that depends on a finite number of 9 will be restrictive 
no matter how large the family of models that it contains. Furthermore, in order 
to be consistent with the principle of parsimony, prior models should make as few 
assumptions as possible and allow their complexity to be inferred from the data. 
To satisfy the aforementioned desiderata and overcome the shortcomings of existing 
approaches, we propose the use of nonparametric priors ( [301121] ). As the term can be 
misleading, we note that this does not imply lack of parameters but rather that the 
number of parameters is not a priori fixed and can change as the data dictates. At 
the core of such representations, lie simple basis functions, whose shape and location 
are controlled by a few parameters. The key unknown is the cardinality of the model, 
i.e. the number of such terms that are needed to provide a good interpretation of the 
data. Consider the expansion: 



tation and </> • associated parameters. Expression (|2.9p is motivated by the representer 
theorem of Kimeldorf and Wahba ([H]), which states that the solution to the problem 
of minimizing a goodness-of-fit loss function subject to a Reproducing Kernel Hilbert 
Space norm penalty lies in a subspace represented as in Equation (|2.9|) . Overcomplete 
representations as in Equation (|2.9j) have been advocated because they have greater 
robustness in the presence of noise, can be sparser, and can have greater flexibility in 
matching structure in the data ( [5D1 [TJ [2T] ) . One possible selection for the functional 



n(9) = p(9 | data) 



p(data | 9) p{9) 
p(data) 



oc p(data | 9) p{9) 



(2.8) 
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form of A'j, that also has an intuitive parameterization, is isotropic, Gaussian kernels: 

Kixify) = (x 3 ,r 3 )) = exp{-r 3 - || x - u 3 || 2 } (2.10) 
The parameters t 3 directly correspond to the scale of variability of f(x). Large Tj's 
imply narrowly concentrated fluctuations and large values slower varying fields. The 
center of each kernel is specified by the location parameter Uj . 
The parameters of the prior model adopted consist of: 

• k: the number of kernel functions needed, 

• {cij}j =0 , the coefficients of the expansion in Equation (|2.9[) . Each of those 
can be a scalar or vector depending on the dimensionality of the exact output 
V- 

• { T j}j=i the precision parameters of each kernel which pertain to the scale of 
the unknown field(s), and 

• the center locations of the kernels which are points in [0, 1] . 

Let 6k = {{o-j}j—o: { T j}j=i> { u j}j=i} <= ®fc denote the vector containing all 
the unknown parameters and 6 = (k,9k)- If k is also assumed unknown and al- 
lowed to vary, then the dimension of Ok is variable as well and &k — (R k+1 ) r x 
(R + ) fc x ([0,1] ) . For example, in the case of two approximate solvers (xi,Xi) 
(M = 2) and a scalar y (r = 1), 9k is of dimension (Je + 1) + k + (2k) = 1 + 4k, 
i.e. 0fc = M. 1+k x (K + )' c x [0, l] 2k . In accordance with the Baycsian paradigm, all 
unknowns are considered random and arc assigned prior distributions which quantify 
any information, knowledge, physical insight, mathematical constraints that is avail- 
able to the analyst before the data is processed. Naturally, if specific information 
about the relation between exact y and approximate x outputs is available it can be 
reflected on the prior distributions. We consider prior distributions of the following 
form (excluding hyperparameters): 

p(k, {a 3 } k =0 , {T 3 } k =11 {x 3 } k =1 ) oc p(k) 

x p({a 3 } k =0 | k) 
x K{t;}*U I k) 

x p({xj}$ =1 )) (2.11) 

In order to increase the robustness of the model and exploit structural dependence 
we adopt a hierarchical prior model (|13j). 

2.2. Prior Distribution. Pivotal to the robustness and expressivity of the 
model is the selection of the model size, i.e. of the number of kernel functions k 
in Equation (|2.9p . This number is unknown a priori and in the absence of specific 
information, sparse representations should be favored. This is not only advantageous 
for computational purposes, as the number of unknown parameters is proportional to 
k, but also consistent with the parsimony of explanation principle or Occam's razor 
([HI [27j [24]). For that purpose, we propose a Poisson prior for k: 

p(k\\) = e- x — fc = 0,l,...,oo (2.12) 
kl 

For computational purposes, the aforementioned distribution is truncated beyond 
k m ax- The latter is selected based on computational limitations and defines the sup- 
port of the prior. This prior allows for representations of various cardinalities to be 
assessed simultaneously with respect to the data. As a result the number of un- 
knowns is not fixed and the corresponding posterior has support on spaces of different 
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dimensions as discussed in more detail in the sequence. In this work, an exponen- 
tial hyper-prior is used for the hyper-parameter A to allow for greater flexibility and 
robustness i.e. p(A | s) = s cxp{— A s}. After integrating out A we obtain: 

p(k | s) ex ^ + for k = 0, 1, . . . , k max (2.13) 

The parameters {rj}^ =1 control the scale of variability in the relation between y 
and x. If prior information about this is available then it can be readily accounted 
for by appropriate prior specification. In the absence of such information however 
multiple possibilities exist. We assumed Tj are a priori independent i.e. p({Tj}* =1 ) = 
Y\j=iP( T j) an d a Gamma(a T ,b T ) prior was used for each Tj'. 

p{{Tj}$ =1 I k, a Tl b T ) = f[ l-r"- 1 expC-Mj-) (2.14) 

This has a mean a T /b T and coefficient of variation l/^/a7- Diffuse versions can be 
adopted by selecting small a T . A non- informative prior p(Tj) cx I/tj arises as a special 
case for a T = 2 and b T — which is invariant under rescaling. Furthermore, it offers 
an interesting physical interpretation as it favors "slower" varying representations (i.e. 
smaller r's). In order to automatically determine the mean of the Gamma prior, we 
express b T = fija T where fij is a location parameter for which an Exponential hyper- 
prior is used with a hyper-parameter a M i.e. p(fij) = —e~ fl i' a >' . Integrating out the 
Hj's leads to following prior: 

k ifc i 7 \ TT r ( a T + l) a?" 1 1 / n1c x 

P{{ Tj } u i *. ^ = n-^ (2 - 15) 

For the coefficients a.j a multivariate normal prior was adopted: 

{a^ =0 \k,a 2 a ^N(0,a 2 a I k+1 ) (2.16) 

where Ik+i is the (fc + 1) x (k + 1) identity matrix. The hyper-parameter er 2 which 
controls the spread of the prior is modeled by the standard inverse gamma distribution 
Inv — Gamma(ao,bo). It can readily be marginalized leading to the following prior 
for dj's: 

1 r(ao + fc+1 ) 
p{{aj)U I «o,&o) = (27r)(fc+1)/2 - — 2 a 0+(fc+1)/2 (2-17) 

For the unknown kernel center locations Uj, a uniform prior in [0, 1] M was used. 
Naturally if prior information is available about subrcgions with significant fluctua- 
tions this can be incorporated in the prior. 

Based on the aforementioned equations, the complete prior model is given by: 

p(9 | s, a r ,a M , a ,b ) = ^ + ^ fc +i 

ArK + i) a? i i 
x II 



x 



r(a T ) T j«r-i) a, (a r r i + a^ 1 )K+ 1 ) 

1 r( ao + ^) 

( 27r )(*+l)/2 A , fc ^a +(fc+l)/2 

"0+2 2^j=o a j 



(2.18) 
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Given n data pairs, (xj,yj)" =1 the likelihood p(yi :n | x 1: „,0) is: 

p(yi:n | X.l: n ,6) = JJ^(j/< | Xj, 0) 

11 1 " 



?:=i 



A Gamma(a, b) prior was used for the variance <r~ 2 of the Gaussian error in Equation 
1)2.6(1 . which is conjugate to the likelihood above and can be readily marginalized 
resulting to the following expression: 



T(a + n/2) 



L n {6) = p(0 | (x 1: „, y 1:n )) = - — V 1 _Z j _^ (2.20) 



where T(z) = J ^i 2-1 e~* is the gamma function. 

The combination of the prior p{0) with the likelihood L n (6) corresponding to n 
data points, give rise to the posterior density 7r n (0) which is proportional to: 

Jr„(0) =p„(0 I (x 1: „,y 1: „)) cx L n (0) p{0) (2.21) 

Even though several parameters have been marginalized from the pertinent ex- 
pressions, the corresponding posteriors can be readily be obtained, or rather be sam- 
pled from, once the posteriors 7r n (0) has been determined. Of particular interest for 
prediction purposes is the variance a 2 of the error term (Equation (|2.6p ). From Equa- 
tion (|2. 19)1 and the conjugate prior model adopted for a 2 , it can readily be shown 
that the conditional posterior is given by a Gamma distribution: 

7T n (cr" 2 , 6) = p{cr~ 2 , 9 | (xi :n , yi :n )) 

= n n (<j- 2 I 8) 7r n (0 | (xi : „, y 1:n )) (2.22) 

and: 

7r„((7" 2 | 6) =p((T~ 2 | 6, (Xi : „,2/i:„)) 

= Gamma (a + |, b + ^Ll^J^ll^ (2 . 23 ) 

In the context of Monte Carlo simulation, this trivially implies that once samples 6 
from 7r„ have been obtained, samples of a~ 2 can also be drawn from the aforemen- 
tioned Gamma. 

It is worth pointing out, that Equation ()2.21|) defines a sequence of posterior den- 
sities with support on ujj™^{fc} x 0^. Each tt„ corresponds to n data points. It 
is easily understood that for small datasets, i.e. small n, the effect of the likelihood 
function L n will be subdued and the the associated posterior 7r ra will have fewer modes 
as it is dominated by the prior. As more data points are added and n increases the 
contribution of the likelihood becomes more pronounced and the posterior will po- 
tentially exhibit more idiosyncratic characteristics. As a result the task of identifying 
these posteriors becomes increasingly more difficult for larger n. It is this feature that 
we propose of exploiting in the next section in order to increase the accuracy and 
improve on the efficiency of the inference process. 
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2.3. Bayesian Computation - Determining the Posterior . The posterior 
defined above is analytically intractable. For that reason. Monte Carlo methods 
provide essentially the only accurate way to infer 7r„. Traditionally Markov Chain 
Monte Carlo techniques (MCMC) have been employed to carry out this task f|30[fT|V 
These are based on building a Markov chain that asymptotically converges to the 
target density (in this case ir r ) by appropriately defining a transition kernel. While 
convergence can be assured under weak conditions ( [22\ I29| ) , the rate of convergence 
can be extremely slow and require a lot of likelihood evaluations. Particularly in cases 
where the target posterior can have multiple modes, very large mixing times might be 
required. In this work we propose a recursive inference algorithm based on Sequential 
Monte Carlo techniques (SMC, [231 1X0] ) that ingests the data one at a time or in larger 
batches and independently of the order of presentation. The sequential incorporation 
of data points introduces a tempering effect in the sense described in the previous 
paragraph. As a result, the global problem of identifying a potentially multi-modal 
posterior is decomposed to a series of easier, tractable problems. More importantly, 
the inferences made can be readily updated if more data becomes available. As 
with Markov Chain Monte Carlo methods (MCMC), in SMC samplers the target 
distribution(s) need only be known up to a constant and therefore do not require 
calculation of the intractable integral in the denominator in Equation (|2.8[) . The basis 
of the approximation is a set of random samples (commonly referred to as particles) , 
which are propagated using a combination of importance sampling, resampling and 
MCMC-based rejuvenation mechanisms ( [HI [Z] ) - Each of these particles is associated 
with an importance weight which is proportional to the the posterior value of the 
respective particle. These weights are updated sequentially along with the particle 
locations. Hence if {9^\ w$}f = i represent N such particles and associated weights 
for distribution 7r„(0) then: 

TV 

7r„(0)«£ WW 6 g w(0) (2.24) 

?:=i 

where W$ = v$/Y%=i 

Wn^ are the normalized weights and 8 g a) (.) is the Dirac 

function centered at 9$. Furthermore, for any function h{6) which is 7r„-integrable 
(00): 

TV 

53 w n ) K e n) ~> / W) *n(0) dO almost surely (2.25) 

i=i ^ 

In order to facilitate the transition between two successive posteriors 7r ra and Tr„+i 
(particularly for small n), we can introduce a series of bridging distributions, based on 
a modified annealing scheme. In particular, if ttq is the prior p(6) (Equation (|2.18[) ) 
we define a family of artificial, auxiliary distributions 7r„. 7 (<?) as follows: 

7r n , 7 (0) cx L nn (0) p{0) (2.26) 

based on the modified likelihood: 

W*) = r (o + (n + 7 )/2) -™ 7 6 [0,1] 

(HiEL(^/(x 1 ;9)) 2 + 7( ! / n+1 -/(x n+1 ;e))r M 

(2.27) 
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where 7 plays the role of reciprocal temperature. Trivially for 7 = we recover ir n and 
for 7=1, 7r )l+1 . The role of these auxiliary distributions is to bridge the gap between 
7r„ and 7r„ + i and provide a smooth transition path where importance sampling can be 
efficiently applied. In this process, inferences based on n data points are transferred 
and updated to conform with additional (n + l) th datum. Starting with a particulate 
approximation for no(0) = p(6) (which trivially involves drawing samples from the 
prior with weights Wq^ =1), the goal is to gradually update the importance weights 
and particle locations in order to approximate the target posteriors n n . 

We propose an adaptive SMC algorithm, that extends existing versions ([7J [5]) 
in that it automatically determines the number of intermediate bridging distribu- 
tions needed. In this process we are guided by the Effective Sample Size ESS = 
V ^2iLi(^s+i) 2 wn i cn provides a measure of degeneracy in the population of par- 
ticles. Let s denote the number of intermediate bridging distributions between 7r n 
and 7r„ + i and 7 S the associated reciprocal temperature. If ESS S is the ESS of the 
population after the step s, then in the most favorable scenario that the next bridging 
distribution 7r„ !7s+1 is very similar to 7r„. 7s , then ESS s +i should not be that much 
different from ESS S . On the other hand if that difference is pronounced then ESS s +i 
could drop dramatically. Hence in determining, the next auxiliary distribution, we 
define an acceptable reduction in the ESS, i.e. ESS s +i > C ESS S (where C < 1) an d 
prescribe 7,5+1 (Equation (|2.26[) ) accordingly. The proposed Adaptive SMC algorithm 
is summarized in Table [2~T1 

Table 2.1 

Basic steps of the Adaptive SMC algorithm proposed 

Adaptive SMC algorithm: 

1. Initialize population {0^, WQ\}fL 1 where 9q are i.i.d draws from the 

prior ttq and u^q = 1 (ESSq = 0). Set I = and s = and 70 = 0. 

2. For I < n: 

a) Set s = s + 1. 

b) Reweigh: If w^l^s) = — '' 7s ^ are the updated 

weights as a function of 7 S then determine "f s £ (7^-1, 1] so 
that the associated ESS S = ^ESS s ~i (the value ( = 0.95 was 
used in all the examples). Calculate w^l for this 7 S . 

c) Resample: If ESS S < ESS m i n then resamplc (the value 
ESS m i n = N/2 was used in all the examples). 

d) Rejuvenate: Use an MCMC kernel Pi, s {-, ■) that leaves 7T; j7s in- 
variant to perturb each particle of\_ l — > 6^, l \ 

e) The current population {O^fl, w^ s }fL 1 provides a particulate ap- 
proximation of 7T/ 7s in the sense of Equations (|2.24p . (|2.25[) . 

f) If 7 5 = 1 set I = I + 1, Ofl = ef} l s , wfl = w^lq, s = and 
70 = 



The role of the Reweighing step is to correct for the discrepancy between the 
two successive distributions in exactly the same manner that importance sampling 
is employed. The Resampling step aims at reducing the variance of the particulate 
approximation by eliminating particles with small weights and multiplying the ones 
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with larger weights. The metric that we use in carrying out this task is the Effective 
Sample Size (ESS) defined earlier. If this degeneracy exceeds a specified threshold, 
resampling is performed. As it has been pointed out in several studies (0), fre- 
quent resampling can deplete the population of its informational content and result 
in particulate approximations that consist of even a single particle. Throughout this 
work ESSmin = N/2 was used. Although other options are available, multinomial 
resampling is most often applied and was found sufficient in the problems examined. 

A critical component involves the perturbation of the population of samples by a 
standard MCMC kernel in the Rejuvenation step as this determines how fast the tran- 
sition takes place. Although there is freedom in selecting the transition kernel P s {.,.) 
(the only requirement is that it is ~k^ 1s -invariant), there is a distinguishing feature 
that will be elaborated in the next sub-section (see I2.3.1j) . The target posteriors ir n 
(as well as the intermediate bridging distributions in Equation (|2.26[) ) live in spaces of 
varying dimensions as previously discussed. Hence an exploration of the state space 
must involve trans- dimensional proposals. Pairs of such moves can be defined in 
the context of Reversible-Jump MCMC (RJMCMC , [H]) such as adding / deleting a 
kernel in the expansion of Equation (|2.9[) . or splitting /merging kernels (see 12. 3. IT) . 

It should be noted that the framework proposed is directly parallelizable, as the 
evolution (reweighing, rejuvenation) of each particle is independent of the rest. The 
particulate approximations obtained at each step, provide a concise summary of the 
posterior distribution based on the respective forward solver. This can be readily 
updated in the manner explained above, if more data become available, i.e. more 
runs of the approximate and exact solver are invoked. 

An advantageous feature of the proposed framework is that the confidence in 
the estimates made can be readily quantified by establishing posterior (or credible) 
intervals from the particulate approximations (Equation (|2.24[1 ). It is these credible 
intervals (or in general measures of the variability in the estimates such as the posterior 
variance) that can guide adaptive acquisition of data. Since we want to minimize calls 
to the exact solver y, we can utilize these inferences in order to perform runs in regions 
that will be most informative of the sought output and therefore make near-optimal 
use of the computational resources available. This will be discussed in more detail in 
the numerical examples. 

2.3.1. Trans-dimensional MCMC. As mentioned earlier, a critical compo- 
nent in the SMC framework proposed is the MCMC-based rejuvenation step of the 
particle locations 9. It should be noted that the kernel P s (-,-) in the rejuvenation 
step (Step 3 of the SMC algorithm) need not be known explicitly as it does not en- 
ter in any of the pertinent equations. It is suffices that it is 7Ti2 i7s -invariant which 
is the target density. For the efficient exploration of the state space, we employ a 
mixture of moves which involve fixed dimension proposals (i.e. proposals for which 
the cardinality of the representation k is unchanged) as well as moves which alter the 
dimension k of the vector of parameters 9. We consider a total of M = 7 such moves, 
each selected with a certain probability as discussed below. Of those, four involve 
trans-dimensional proposals which warrant a more detailed discussion. 

It is generally difficult to design proposals that alter the dimension significantly 
while ensuring a reasonable acceptance ratio. For that purpose, in this work we 
consider proposals that alter the cardinality k of the expansion by 1 i.e. k' = k — 1 
or k' = k + 1. We adopt the the Reversible- Jump MCMC (RJMCMC) framework 
introduced in |16j according to which such moves are defined in pairs in order to 
ensure reversibility of the Markov kernel (even though the reversibility condition is 
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not necessary, it greatly facilitates the formulations). We consider two such pairs of 
moves, namely birth-death and split-merge. Let a proposal from (k,6) to (k',0') that 



(see last paragraph 



increases the dimension i.e. k = k + 1 and 6 £ 
of sub-section 12. 2[) . Let p(fc — > k') the probability that such a proposal is made 
(user specified) and p(k' — > fc) the probability that the reverse, dimension-decreasing 
proposal is made. In order to account for the rrik+i — mk difference in the dimensions 
of 9 and 6', the former is augmented with a vector u £ n m k+i-m k drawn f rom a 
distribution q(u). Consider a differential and one-to-one mapping h : W r,lk+1 — > K mfc+1 
that connects the three vectors as 6' = h(9,u). Then as it is shown in |16j . the 
acceptance ratio of such a proposal is: 



min < 1 , 



7r 12ns (0')p(k ^ k') 1 
7ri2, 7s (0)p(fc' -> k) q(u) 



de 1 



8(6, u) 



(2.28) 



where 



de' 



8(0, u) 



is the Jacobian of the mapping h. Such a proposal is invariant w.r.t. 

the density 7Ti2, 7s . Similarly one can define, the acceptance ratio of the reverse, 
dimension-decreasing move: 



Tr^^Mfe' -> fc) 



^ I 1 ' 7T 12l7 .((90p(ft fc 7 ) 



90' 



0(0, u) 



(2.29) 



In the following we provide details for the reversible pairs used in this work. 
Birth-Death: In order to simplify the resulting expressions, we assign the following 
probabilities of proposing one of these moves purth — c m»n{l, P ^p(t) } = c 3+1 
(from Equation (|2.13[) ) and Pdeath = c rnin{l, P ^p^ } = c (from Equation (|2.13p ). 
The constant c is user-specified (it is taken equal to 0.2 in this work). Obviously if 
k = k rnax , p blr th = and if k = 0, p de ath = 0. 

For the death move: 

• A kernel j ' (1 < j ' < k ) is selected uniformly and removed from the represen- 
tation in Equation (12.91) . 

• The corresponding aj is also removed. 
For the birth move: 

• A new kernel fc + 1 is added to the expansion while the existing terms remain 
unaltered. 

• The associated amplitude afe+i is drawn from Af(0, erf) (the variance cr| is 
equal to the average of the squared amplitudes aj over all the particles at the 
previous iteration) 

• The associated scale parameter Tk+i is drawn from the prior, Equation (|2.15[) 

• The associated kernel location i/fc+i is also drawn from the uniform prior, 
Equation ([2T8]) . 

Hence the vector of dimension-matching parameters u consists of u = (au+i, Tfc+i, a:fc+i) 
and the corresponding proposal q(u) is: 



q(u) 



1 1 



2tt 0"4 



T(a T ) 



'fe+i 



exp(-6 r r fe+ i) 



(2.30) 



It is obvious that the Jacobian of such a transformation is 1. 

Split-Merge These moves correspond to splitting an existing kernel into two or 
merging two existing kernels into one. Similarly to the birth-death pair, they alter 
the dimension of the expansion by 1 and arc selected with probabilities p sp ut = 
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and Pmerge = c. For obvious reasons, p sp ut — if fc = fc max and p merge = if fc < 1. 
Consider first the merge move between two kernels ji and In order to ensure a 
reasonable acceptance ratio, merge moves are only permitted when the (normalized) 
distance between the kernels is relatively small and when the amplitudes 
relatively similar. Specifically we require that the following two conditions are met: 

<S X \ ajl -a j2 \<S a (2.31) 



(the values S x = S a = 1 were used in this work). Two candidate kernels are selected 
uniformly from the pool of pairs satisfying the aforementioned conditions. The pro- 
posed kernels j\ and ji are removed from the expansion and are substituted by a new 
kernel j with the following associated parameters: 



31 32 



(2.32) 



- ( _2Z2_ + (233) 



This ensures that the average value of the previous expansion (with j\ and 
j'2) in Equation (|2.9p when integrated in R d is the same with the new (which 
contains j in place of ji and j'2) 



' n ± 1/32 (2.34) 



The split move is applied to a kernel j (selected uniformly) which is substituted 
by two new kernels j\, j2- In order to ensure reversibility, kernels ji and J2 should 
satisfy the requirements of Equation (|2.31[) and the application of a merge move in 
the manner described above, should return to the original kernel j. There are several 
ways to achieve this, corresponding essentially to different vectors u and mappings h 
in Equation (|2.28|) . In this work: 

• A scalar u T is drawn from the uniform distribution U[0, 1] and rj x = v-ttJ 1 
and t~ = (1 — m t )t _1 . This ensures compatibility with Equation (|2.32|) . 



• A vector u x is drawn uniformly in the ball of radius R where R - 
The center of the new kernels are specified as Vj 1 = Uj — u x and fj 2 = 
i/j + u x . This ensures compatibility with the first of Equation (|2.31|) as well 
as Equation (|2.34[) . 

• A scalar u a is drawn from the uniform distribution {/[—%-, %■]. The ampli- 
tudes of the new kernels are determined by = a — u a and a J2 = a + u a , 

where a = a+ "j^^ /1 ^ — . This ensures compatibility with the second of 

Equation (|2.31[) as well as Equation (|2.33|) . 
The vector of dimension-matching parameters u (in Equation (|2.28j) ) consists of 
u = (u T , u Xl u a ) and the corresponding proposal q(u) is a product of uniforms in the 
domains specified above. After some algebra, it can be shown that the Jacobian of 
such a transformation is 2 M+1 a ,, T — rr . 1 
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(b) Split-Merge moves 



(a) Birth-Death moves 

Fig. 2.2. Trans- dimensional RJMCMC proposals 



The remaining three proposals, involve fixed-dimension moves that do not change 
the cardinality of the expansion but rather perturb some of the terms involved. In 
particular, we considered updates of the amplitude a,j, scale Tj or location Xj of a 
kernel j selected uniformly (naturally, in the case of the amplitudes, the constant 
ao (Equation (|2.9p ) is also a candidate for updating). Each of these three moves 
is proposed with probability \{p blr th + P death + P split + Pmerge) = 
particular: 



l 

s+l 



1). In 



1. Update ctj — > a^: 
and perturbed as: 



A coefficient Oj (in Equation (|2.9jl ) is uniformly selected 



a! 3 =a 3 +o x Z ,Z~ N{0,1) 

2. Update Tj — > rj: A scale parameter Tj (in Equation 
lected and perturbed as: 



Z~ jV(0,l) 



(2.35) 

) is uniformly sc- 
(2.36) 



(this ensures positivity of rj) 
3. Update ia, — > u'f. A location i/j e [0, 1] M (in Equation (|2.9p ) is uniformly 
selected and perturbed as: 



(T 3 Z, Z = (Z 1 ,...,Z d ), Z~7V(0,1) 



(2.37) 



The acceptance ratios are calculated based on the standard MCMC formulas using 
7Ti2.7 s as the target density. It should be noted that the variances in the random walk 
proposals are adaptively selected so that the respective acceptance rates are in the 
range 0.2 — 0.4. As it is well-known (chapter 7.6.3 in [53]) adaptive adjustments of 
Markov Chains based on past samples can breakdown crgodic properties and lead to 
convergence issues in standard MCMC contexts. In the proposed SMC framework 
however, such restrictions do not apply as it suffices that the MCMC kernel is invari- 
ant. This is an additional advantage of the proposed simulation scheme in comparison 
to traditional MCMC. 



2.4. Prediction - Calculating statistics of exact solver. We return to the 
original problem of estimating statistics of the output y of the exact solver using 



16 



P.S. KOUTSOURELAKIS 



the relationship with the outputs x of the approximate solvers. The aforementioned 
Bayesian model is able to not only provide estimates but also quantify the level of 
confidence one can assign to the predicted outcome. Let (9, cr) denote a pair of 
values for the model parameters in Equation (|2.6p . For these parameter values, the 
conditional density p(y | x) can be obtained based on the regression model adopted 
and Equation (|2.7|) : 

11 r 1 



p(y |x)»p(y|x,0,<7) = -^=-exp{--^(y - /(x;0)) 2 } (2.38) 

V27T cr 2cr z 

which upon substitution in Equation (|2.4[) yields: 

Pr[y€A;0,a] = J q A (x;9,cr) tt x (x) dx (2.39) 



where: 



q A (x; 9,<r) = J l A (y) p(y | x, 9, a) dy (2.40) 

The latter expresses the probability that exact response y £ A for fixed approximate 
response x (and model parameters). Consider for example the case that we are inter- 
ested in calculating a probability of exceeding a threshold yo £ M, i.e. A — (ya, +oo). 
Then: 

qA ( X ;e,*) = *( f ^- yo ) (2.41) 



where <&(z) = f z —?=e ™ dw is the standard normal CDF. 

V ' J -00 V27T 

Given a number of training data (xi m , yi :n ), the plausibility of various parameter 
values is quantified by the posterior Tr n (9) = p(9 \ (xi :n , yi-.n))- Hence, one can 
obtain point estimates of q A based for example on the maximum a posteriori values 
(MAP) 9 map = argmax ir n (9) or the posterior mean E n [9] = J 9 tt„(9) d9. More 
importantly, due to its dependence of 9 (and cr), qA is also random and its distribution 
can be determined from the posterior distribution of 9 and cr. This distribution 
therefore indirectly depends on the training data upon which posterior inferences 
were based. In particular, one can estimate the posterior mean of q A for the case of 
Equation (|2.41|) as follows: 



q A ( X ) = E n [q A ] = J $ ( /(X;0 J V ° ^j n n (9,a- 2 ) d9 da 



(from Equation ([2~22]) ) 

0~r> 



where the particulate approximation {9^\Wn^}fLi °f n n obtained through the pro- 
posed SMC scheme was used and {<J$}f = i are corresponding draws from the condi- 
tional posterior in Equation (|2.23p . Equation (|2.42|) expresses how, on average, based 
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on the available data (x 1:ra , yi- n ) the probability of interest depends on the approxi- 
mate solver values x. This can provide an estimate of the probability of interest by 
substitution in Equation (|2.39|) . 

Furthermore, the weighted samples g^(x) = $ ^^ x ' 9 " i ^~ yn ^ provide a partic- 
ulate approximation of the distribution of <7a(x), Vx. p-Quantiles <^4 iP (x) can be 
readily estimated as: 

JV 

Pr[q A (x) < ^(x)] » W n ] H (<$( x ) - lA, P (x)) = P (2-43) 
i=l 

where if (.) is the Hcaviside function. These can readily yield confidence bounds for 
the probability of interest when substituted in Equation (|2.39[) . More importantly 
perhaps, these bounds can serve as the basis for active learning i.e. determining 
where more training samples need to be generate in order to refine the estimates 
produced. Consider for example the p = 1% and p = 99% quantiles <?_4,o.oi(x) and 
9.4,0.99 (x). then based on Equation (|2.39p ) an ordering of x can be constructed based 
on: 

(9^,0.99 (x) - gu,o.oi (x)) 7T x (x) (2.44) 

Thus x (or regions in the x-space) for which the aforementioned value is large con- 
tribute more in the uncertainty about the probability of interest Pr[y £ A] and 
therefore could serve as the best candidates for generating additional training pairs 
(Xn+i, Vn+i)- This is particularly important in the cases considered where each run 
for the evaluation of the exact response y can be extremely expensive and therefore 
optimal use of the computational resources is crucial. These additional training sam- 
ples can be readily incorporated based on the SMC scheme adopted and updates of 
the particulate approximation that reflect the new data can be produced. These in 
turn can lead to updates in the estimates made as well as the confidence bounds. 
Naturally other measures of variability of qu(x), such as the variance (or coefficient 
of variation) can be used in place of (<L4,o.99( x ) — <L4,o.oi(x)) in Equation (|2.44[) . The 
variance for each x can be estimated as follows: 

N 2 

Var n [q A ]^J2 W n ] (<Z^ « - <u(x)) (2.45) 

i=l 

For estimates of expectations of functions h(.) of the exact output y as in Equation 
(|2.5p . the same results apply if in place of g^(x) we use: 

g h (x) - J h(y)p(y | x) dy (2.46) 

3. Numerical results. In the examples presented, the following values for the 
hyperparameters of the prior model were used (Equation (|2.18|) ): 

• kmax = 100 and s = 1.0 (Equation (|2~T3]0 

• a t au = 1.0 (Equation (JUTJ)) and a p = 0.01 (Equation (|2~T5]0 

• a = 1.0 and b = 1.0 (Equation ([237) ) 

• a = 2. and b = 1. x 10~ 6 (Equation (f2~T9|) ) 

Furthermore, N = 1,000 particles were employed in the adaptive SMC scheme de- 
scribed in section 12.31 As in most systems of practical interest, the computational 
cost is dominated by the number of calls to the forward solver, we report results on 
computational effort in terms of the number of runs of the exact solver. 
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3.1. Example 1. The first example involves a problem from fracture mechanics 
where it is known that small scale stochastic fluctuations can have a significant im- 
pact in the macroscale response. We consider cohesive interface of unit length that is 
pulled apart in Mode I fracture and is modeled with cohesive zone elements (Figure 
13. 1|) . These are line (or surface in 3D) elements which are located at the interface 
and govern the separation process in accordance with a cohesive law. The concept 
of cohesive laws was pioneered by Dugdale (Qj]) and Barenblatt ([3]) in order to 
model fracture processes and has been successfully used in a Finite Element setting 
by several researchers ( [33J SI US] ) ■ According to these models, fracture is initiated 
when the interface traction exceeds a threshold T c and progresses gradually as the 
separation takes place across an extended crack tip or cohesive zone and is resisted 
by cohesive tractions. We assume herein a simple constitutive law relating interface 
traction-separation as seen in Figure 13.11 Under monotonic loading the normal inter- 
face traction decays as T = T c ^1 — j-^j for 6 < S c and T = for 5 > S c . The fracture 
energy G c is given by G c — T c 5 c /2. The constitutive rate equations are: 

( -T c j- if 8 > 
f=l ?-f if<j<0 (3-1) 
{ if S > S c 

At the microstructural level, the cohesive properties exhibit random variability. We 
adopt the following simple random field descriptions for the model parameters: 

T c (z) =T Q + AT Ui(z) 

G c (z)=G + AG (pU 1 (z) + U 2 (z)) ze {0,1} (3.2) 

where: 

U i (z)=2*(h i (z))-1, <= 1,2 (3.3) 

and hi(z) is a zero- mean, unit variance Gaussian process with autocorrelation Rh(Az) = 
E [h(z)h(z + Az)] = exp{— ■^^} ($ is the standard normal CDF). The parameter z$ 
controls the length scale of heterogeneity and it was taken equal to 0.1. The field 
h2(z) was assumed to represent a Discretized white noise process. The parameter p 
controls the autocorrelation between the two properties and was taken equal to 0.9 
which implies that areas with high T c are more likely to have high G c as well. Also, 
the values T = 1.0, AT Q = 0.5, G c = 10~ 3 and AG Q = 0.5 x 10~ 3 were used. 

The exact solver was a detailed finite element model consisting of 1 , 000 cohesive 
elements of equal length with properties assigned based on the values of T c (z) and 
G c (z) at their midpoint. The mesh size is much smaller than the length scale of 
variability of the cohesive properties as determined by the correlation length zq defined 
above. The output of interest y was the fracture energy released when a uniform 
separation 8 = 0.5 x 10~ 3 was applied at the interface. The quasi-static, nonlinear 
calculation of the output of interest was carried out by applying separation increments 
of 0.5 x 10~ 6 in order to capture accurately the traction-separation history (i.e. a total 
of 1,000 iterations). 

We considered a single approximate solver with a much coarser mesh consisting 
of only 10 cohesive elements of equal length, i.e. each macro-element takes the place 
of 100 micro-elements of the exact solver. The assigned cohesive strength T c in each 
macro-element was set equal to the minimum of the cohesive strengths of the 100 
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Fig. 3.1. Cohesive interface - Cohesive Law: T c denotes ultimate interfacial tension (when 
the stress reaches T c the cohesive element is activated) , S c denotes the ultimate separation interface 
(when the separation reaches 8 C the interface tension becomes zero) and G c denotes the fracture 
energy which is equal to the area under the tension-separation curve. 



corresponding micro-elements, and the fracture energy G c equal to the average of 
the fracture energies of the the 100 corresponding micro-elements. In addition a 
displacement increment of 0.5 x 10 -5 (in contrast to the 0.5 x 10 -6 for the exact solver) 
was used in order to carry out the quasi-static, nonlinear integration of the equations 
of equilibrium and the constitutive model (Equation ()3.1|) ). As a result of these 
crude simplifications the approximate solver was 1, 069 faster than the exact. Figure 
3.21 compares the approximate and exact solver output prediction where significant 
discrepancies can be observed (e.g. when x = 4 x 10~ 4 , y « 5 x 10~ 4 , i.e. a 25% 
difference). It is also observed that the mapping from x to y is one-to-many, as the 
coarse model crudely smears some of the fine details that affect the response. 
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Fig. 3.2. Exact y vs. Approximate x response 



Iii order to assess the performance of the method proposed we considered the 
event y > yo = 5.601 x 10~ 4 which corresponds to a probability 10~ 3 . This was 
found using an advanced simulation procedure based on Sequential Monte Carlo as 
in the first example ([UEI]). The computational effort amounted to 1,500 calls to 
the exact solver and the coefficient of variation (c.o.v) of the estimate was 0.23 (based 
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Fig. 3.3. Cumulative distribution function for the approximate solver output x 

on an asymptotic, and rather optimistic, bound in [2]). This roughly implies that 
with probability 0.95, the actual probability of the event of interest is in the interval 
[0.55 x 10 -3 , 1.5 x 10 -3 ]. For the same c.o.v and standard Monte Carlo 18, 500 calls 
to the exact solver would have been needed. 

The density n x of x = x\ was estimated using the same advanced Monte Carlo 
scheme that required 5, 000 calls to the approximate solver. The corresponding CDF 
is depicted in Figure 13.31 for probabilities as low as 10~ 5 . Note that due to the 
reduced computational effort associated with calls to the approximate solver, the 
computational time for this task amounted to (approximately) 5 runs of the exact 
solver. 

The crucial task, that of estimating p(y | x\) involves the nonparamctric Baycsian 
regression model discussed previously. Figure 13.41 depicts posterior statistics of the 
regression model for various training sample sizes and Figure 13.51 the posterior mean 
and posterior quantiles of g^i(x) based on Equations (|2.42|) and 12.431 for various sam- 
ple sizes. Table [3~T1 summarizes the estimates based on the posterior mean (x) 
and confidence bounds established with 9.4,0.01 ( x ) and 9.4, 0.99 ( x )- It is noted that 
even with a small number of calls to the exact solver, the estimates obtained are rea- 
sonably good and most importantly the lower and upper confidence bounds always 
include the reference value. This is particularly important in engineering purposes as 
the analyst can decide whether these confidence bounds are satisfactory and if not 
perform additional calls to the exact solver in order to refine them. As the number 
of training samples increases the posterior mean approaches the true value and the 
credible intervals become more concentrated. A complete view of the the cdf of the 
exact output y is depicted in Figure 3.6 based on 150 training samples. 

3.2. Example 2. We consider a problem in nonlinear solid mechanics that il- 
lustrates the capabilities of the proposed methodology and the the significant im- 
provements in computational efficiency even when very crude approximate solvers 
are selected. A random two-phase medium consisting of two elastic-perfectly-plastic 
materials occupies the unit square in 2D and is subjected to plane stress loading 
conditions. It was assumed that the matrix phase (white in Figure [3~7|l had a yield 
stress cr^lel" X = 0-1 an d the inclusion phase (black in Figure 13 . T[) a yield stress 
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(e) 150 training samples 
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Fig. 3.4. Posterior mean o//(x; 0) and and posterior density of a (Equation 12. 61) ) for various 
training sample sizes 



Table 3.1 

Estimates of Pr[y > yo = 5.601 X 10 -4 ] and computational effort (the latter is measured in 
number equivalent number of calls to the exact solver 



Number 


Posterior 


Posterior 


Posterior 


Computational 


of samples 


mean 


quantile 1% 


quantile 99% 


Effort 


10 


6.36 x 10~ 3 


5.91 x 10~ 4 


7.17 x 10~ 2 


15 


50 


1.75 x 10~ 3 


7.39 x 10~ 4 


3.55 x 10~ 3 


55 


150 


1.01 x 10~ 3 


7.07 x 10~ 4 


1.42 x 10~ 3 


155 
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(c) 150 training samples 

Fig. 3.5. Posterior mean and quantiles for Pr[y > 5.601 X 10 — 4 | x] based on various sample sizes 



reference 
posterior mean 
posterior quantile 1% 
posterior quantile 99% 




Fig. 3.6. Posterior mean and quantiles for Pr[y > yo], Vyo based on 150 training samples 



a% yieid ilon = 1-0 The same elastic properties were assumed for both phases (elastic 
modulus E = 1.0 and Poisson's ratio v = 0.3) and the von-Mises yield criterion was 
used. The stochasticity in the problem is introduced by the distribution of the black 
discs of diameter d = 0.0859375 whose centers are assumed to follow a Poisson point 
process on the unit square. The intensity of the point process is selected so that the 
volume fraction of the inclusion phase is 65%. This was intentionally chosen to be 
close to the percolation threshold of 68% ([M]) in order to have realizations where the 
inclusion phase was connected and disconnected. In the former case, the load-bearing 
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capacity (in the horizontal direction) of the specimen is high as the strong, inclu- 
sion phase forms a network that carries the load, whereas in the disconnected case, 
the load-bearing capacity is lower and determined by the yield stress of the matrix 
phase. Hence the stochastic geometry completely determines the mechanical behav- 
ior of the system. The vector of uncertain parameters £ ( Equation l|2.2[) ) consists 
of the number of inclusion disks and the coordinates of their centers. As the former 
is a random variable (following a Poisson distribution) the corresponding 7r^(^) has 
support in spaces of varying dimension and non-zero probability for arbitrarily large 
d where £ £ W l . It should be noted however that on average there are 181 disks and 
dim(£) = 1 + 2 181 = 363. Usual dimension reduction techniques based on second 
order properties would be extremely misleading in this case as higher order statistics 
of the random medium (relating to connectivity) dominate mechanical response. 




(a) Connected inclusion phase (high strength) (b) Disconnected inclusion phase (low strength) 

Fig. 3.7. Example 2 

The exact model corresponds to a Finite Element solver with 128 x 128 elements 
i.e. 11 elements per inclusion diameter and m 33, 000 dof, of the governing equations 
of equilibrium: 



V-<r = 

where a is the Cauchy stress tensor and : 

b = c : (e - e p ) 



(3.4) 



(3.5) 



the constitutive rate equations with e and e p being the total and plastic strain tensors 
respectively. 

The yield stress was assumed constant within each element and equal to the yield 
stress of the the phase occupying the majority of its area. The response of interest y 
was the ultimate strength of the specimen in the horizontal direction and the average 
computational time was w 700sec on a single CPU. 

A single approximate model was used (i.e. M = 1) which corresponds to Finite 
Element solver on a uniform 8x8 mesh and 128 dof. Constant yield stress was 
assigned to each of the 64 elements based on the log-average of the yield stress within 



each element (Figure 3.8(b)). Hence if T> e is the subdoman occupied by element e, 
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its yield stress <J y i e id,e — ^PlfzJ - ! Jx> 1°S { a yieid( s )) ds}. The computational time 
for calculating the ultimate strength x = x\ was 0.15 sec, i.e. « 4,700 times faster 
than the exact model. It is obvious that such a solver introduces a significant error as 
it does not sufficiently resolve the governing PDEs and smears out the connectivity 
details that determine the ultimate strength of the specimen. Furthermore, the log- 
average rule used to determine the yield stress of the elements does not represent a 
consistent upscaling scheme of the material model. This discrepancy can be seen in 
Figure [3T9l which depicts 100 pairs of approximate x vs. exact response y. It is also 
observed that the mapping from x\ to y is one-to-many as the approximate model 
blurs some of the important microstructural details. 
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(a) Exact model - 128 X 128 mesh (700 sec) (b) Approximate model - 8 X 8 mesh (0.15 sec) 
Fig. 3.8. Exact vs. Approximate solvers 
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Fig. 3.9. Exact y vs. Approximate x response 



In order to compare the performance of the method we considered the event 
V > Va = 0.521 which corresponds to a probability 10~ 3 . This was found using 
an advanced simulation procedure based on Sequential Monte Carlo as discussed in 
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detail in ([551 [HI EH])- The computational effort amounted to 1,500 calls to the 
exact solver and the coefficient of variation (c.o.v) of the estimate was 0.23 (based 
on an asymptotic, and rather optimistic, bound in [2]). This roughly implies that 
with probability 0.95, the actual probability of the event of interest is in the interval 
[0.55 x 10~ 3 , 1.5 x 10~ 3 ]. It should be noted that to achieve the same c.o.v with 
standard Monte Carlo, 18, 500 calls to the exact solver would have been needed. 

The density ir x of x = x\ was estimated using the same advanced Monte Carlo 
scheme that required 5,000 calls to the approximate solver. The corresponding cdf is 
depicted in Figure 13". 101 for probabilities as low as 10 -5 . Note that due to the reduced 
computational effort associated with calls to the approximate solver, the effective 
computational time for this task amounted to (approximately) 1 call to the exact 
solver. 

As with the previous example Figure [3. 1 ll dcpicts posterior statistics of this model 
for various training sample sizes. Figure [3.12l depicts the posterior mean and posterior 
quantiles of <2u( x ) based on Equations (|2.42[) and (|2.43|) for various sample sizes. 
Table [3~2l summarizes the estimates based on the posterior mean q_4(x) and confidence 
bounds established with <3U.o.oi( x ) and <L4.o.99( x )- It is noted that good estimates of 
the actual output statistic can be obtained at a fraction of the computational cost. 
Furthermore good confidence bounds are readily obtained for all sample sizes. The 
same good quality in the results is also observed in the estimates of the whole cdf of 
the exact output which is depicted in Figure [3. 131 for 100 training samples. 

Table 3.2 

Estimates of Pr[y > yo = 0.521] and computational effort (the latter is measured in number 
equivalent number of calls to the exact solver 



Number 


Postcrioi 




Posterior 


Posterior 


Computational 


of samples 


mean 




quantile 1% 


quantile 99% 


Effort 


10 


1.47 x 10" 


2 


2.33 x 10~ 4 


3.80 x 10- 1 


11 


20 


6.24 x 10" 


•A 


3.56 x 10" 3 


1.90 x 10~ 2 


21 


30 


2.64 x 10" 


3 


3.50 x 10~ 4 


8.55 x 10~ ;i 


31 


50 


2.64 x 10- 


3 


4.25 x 10~ 4 


5.23 x 10" 3 


51 


100 


1.06 x 10 _ 


3 


4.75 x 10~ 4 


2.14 x 10" 3 


101 
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We examined the same problem using M = 2 approximate solvers. In addition 
to the approximate solver discussed earlier we considered a second FE model with 



the same mesh of 64 elements (Figure 3.8(b) I. The yield stress a y i e id,e for each 
element e was now determined by averaging i.e. <J y i e id.e — v^i J-rj Pyieid{s) ds. This 
is again a non-consistent rule with the exact constitutive model and would yield 
approximate solutions. Furthermore due to the concavity of the log-function and 
Jensen's inequality the assigned yield stresses a y i e id, e were smaller in the first model 
and as a result the predicted approximate outputs x\ < x^. 

Figure [3.141 depicts 50 triplets (x\,X2,y) comparing the exact output with the 
approximate solutions provided by the two reduced models. It is expected that the 
addition of the second model will yield more information about y that can be readily 
taken into account by the Bayesian framework presented. The joint pdf 7r a (351,0:2) 
was estimated using 5,000 calls to each solver (i.e. total 10,000) which due to their 
reduced computational cost amounted to the equivalent of 2 runs of the exact 
solver. 

Figure 13.151 depicts posterior statistics of the regression model for 50 training 
samples. Finally Figure [5.161 illustrates the complete cdf of the exact output y based 
on the same number of training samples. The computational effort for obtaining this 
result is equivalent to 52 calls to the exact solver (i.e. 2 for estimating n x and 50 
for obtaining the training data. The addition of the second predictor X2 offers a 
significant improvement w.r.t. Figure [3.131 not only in terms of accuracy but also in 
terms computational efficiency since the latter result required effectively 101 calls to 
the exact solver. 

4. Conclusions. The majority of systems of physical and engineering interest 
are characterized by a large number of uncertainties. These are non-Gaussianly dis- 
tributed and quite frequently their higher-order properties play a decisive role in the 
statistics of the response/output. While Monte Carlo techniques provide the only 
general method for uncertainty quantification in such systems and despite the signif- 
icant progress of recent years, they might still require an infeasible number of calls 
to the forward solver. The present paper introduced a Bayesian framework where 
outputs from approximate, inexpensive solvers can be rigorously utilized in order to 
accelerate the solution process. We made use of a flexible, non-parametric Bayesian 
model and a general SMC-based inference engine that is able to establish a quantita- 
tive link between approximate and exact solver. This can in turn be used to produce 
estimates for the output statistics of interest and rigorous confidence bounds. While 
this capability was not utilized in the examples presented, these credible intervals can 
assist in minimizing the number of calls to the exact solver by performing runs in 
selected regions that will be most informative. Furthermore it offers the capability 
of utilizing multiple approximate solvers and it opens the door for designing or opti- 
mizing systems in the presence of uncertainties based on establishing functions of the 
output statistics with respect to the design variables. 
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FlG. 3.11. Posterior mean of /(x; 6) and posterior density a for various training sample sizes 
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FlG. 3.12. Posterior mean and quantiles for Pr[y > 0.521 | x] based on various sample sizes 




FlG. 3.13. Posterior for Pr[y > yo] Vyo based on 100 training samples 
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FlG. 3.14. Exact y vs. Approximate x = (xi,X2) response 
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FlG. 3.15. Posterior mean of /(x; 0) and posterior density a for 50 training samples 
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FlG. 3.16. Posterior mean and quantiles for Pr[y > yo], Vj/o based on 50 training samples 



